Introduction (AB)
In 1969, Michael Southworth defined the soundscape as “the acoustic environment as perceived by humans, in context” in his paper “The Sonic Environment of Cities” (Southworth, 1969). This definition implies that soundscapes are inherently influenced and characterized by people, and that they are something we can study. Since then, the ways in which soundscapes are defined and used have evolved drastically. Examining the soundscape of an environment has become a common way to monitor its health, with researchers recording the sounds of everything from forests to tundra to oceans to cities (Baquero et al., 2021; Heath et al., 2021; Keen et al., 2022; Rostami & Nansen, 2022; Transue et al., 2023). These soundscapes are often summarized using categories of sound sources (biophony, anthrophony, and geophony) and acoustic indices that measure different aspects of sound (e.g., complexity, evenness) (Bradfer-Lawrence et al., 2019).
Baquero et al. examine the soundscape of the ocean, noting that there was not yet a standardized way to do so (2021). Transue et al. listen to an urban port, exploring both the ocean and its surroundings (2023). Heath et al. examine the soundscape of a tropical terrestrial ecosystem, noting that expense and a lack of standardization in the field make it difficult to compare studies (2021). Rostami & Nansen also examine terrestrial ecosystem health, but with active acoustic transducers, which further illustrates this lack of standardization (2022). Keen et al. listen to the soundscape of soil, noting how it changes with the presence of invasive earthworms (2022). The differences and similarities between these studies demonstrate an emerging subfield of ecoacoustics that strives to create a common method for understanding our ecosystems, their viability, characteristics, and wellness, using the sounds that emerge from within them.
Professor Florencia Sangermano at Clark University is applying these concepts to study ecosystem health and biodiversity in Central Massachusetts, using her recordings to create various sound indices for the area (Hanson, 2022). She specifically aims to understand “how human-dominated landscapes affect biodiversity and ecosystems” (Sangermano, 2022) for purposes of conservation, examining factors such as the level and type of vegetation as well as human-made noise and light. Sangermano (2022) calculated the Acoustic Diversity Index (ADI), Acoustic Complexity Index (ACI), and Normalized Difference Soundscape Index (NDSI) from recordings of the dawn chorus and found significant relationships between these indices and the aforementioned factors.
On a different note, the use of R for data analysis and visualization has been on the rise since 2012, especially in academia (Robinson, 2017). Because R is an open-source programming language, new capabilities are constantly being added, including those for geospatial analysis. The reproducibility and accessibility of analyses and visualizations made in R are attractive compared to more expensive and less iterative workflows. Much ecoacoustic work thus far has had the potential for a geospatial component, but one has rarely, if ever, been included. And, as discussed above, the field lacks consistency and standardization. As such, it makes sense to use R to bring a geospatial component into ecoacoustic work.
This project will use sound indices developed in Sangermano (2022) to perform regression analysis against key environmental metrics, specifically land surface temperature, nighttime lights, ecological integrity, aboveground biomass density, and canopy height. It will use this regression to create a predictive raster map of ecosystem health as defined by the characteristics of biophonies and anthrophonies. The primary objectives of this project are as follows:
Implement random forest models to predict biophony, anthrophony, and sound complexity in a given area.
Create visualizations of predicted biophony, anthrophony, and sound complexity.
Analyze the relative importance of input factors into the prediction of sound profiles at the recording sites.
Present the results of our research in a way which is easily accessible to the public.
By completing this analysis, we hope to be able to predict soundscapes for this area and similar ones. Because the tools we use are open and accessible, others will be able to apply these methods, making the work not only relevant but also usable for biodiversity conservation in the future. This will be increasingly important as climate change and human activity progress and their impacts on ecosystems change.
Below, Figure 1: study sites underlain by Massachusetts town borders and the Index of Ecological Integrity (IEI)
Below, Figure 2: study sites underlain by nightlights
Methods/Approach (AN)
All of the analysis was completed in RStudio using R 4.2.2.
In order to achieve our research objectives, we decided to create three random forest (RF) regression models using the ACI, ADI, and NDSI indices as dependent variables, and nightlights, ecological integrity, canopy height, distance from roads, and land surface temperature as independent variables.
Data Description/Read-in Process
The index data that we received from Prof. Sangermano were recorded over the summer of 2021. In the form we received them, each index was calculated for 1-minute recordings, taken every 14 minutes, from 4 to 7 am. Index values were excluded when precipitation at the point and time of the recording was greater than 1 mm per hour. We averaged the three indices by day and by recorder, so that each day and recorder would have one value for each of the three indices, and joined the resulting tables by recorderid to the shapefile containing the locations of the recorders (see 6_create_main_DF.rmd).
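The averaging and join described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code in 6_create_main_DF.rmd: the table layout, the column names other than recorderid, and all values are invented, and a plain data frame of coordinates stands in for the shapefile.

```r
# Hypothetical per-recording index table (column names and values are assumptions).
recordings <- data.frame(
  recorderid = c("A", "A", "A", "B", "B"),
  datetime = as.POSIXct(c("2021-06-01 04:00", "2021-06-01 04:14",
                          "2021-06-02 04:00", "2021-06-01 04:00",
                          "2021-06-01 04:14"), tz = "UTC"),
  aci  = c(150, 154, 160, 140, 142),
  adi  = c(2.0, 2.2, 1.8, 1.5, 1.7),
  ndsi = c(0.5, 0.7, 0.2, -0.1, 0.1)
)
recordings$date <- as.Date(recordings$datetime)

# One mean value per index, per recorder, per day.
daily_means <- aggregate(cbind(aci, adi, ndsi) ~ recorderid + date,
                         data = recordings, FUN = mean)

# Hypothetical recorder locations standing in for the shapefile attributes.
sites <- data.frame(recorderid = c("A", "B"),
                    x = c(-71.80, -71.85),
                    y = c(42.25, 42.30))

# Join the daily means to the locations by recorder ID.
daily_with_sites <- merge(daily_means, sites, by = "recorderid")
print(daily_with_sites)
```

Averaging before the join keeps one row per recorder and day, which is the unit the models are later fitted on.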
The night lights dataset is an average of night lights for our study area from June to August 2021, from the data product “Visible Infrared Imaging Radiometer Suite (VIIRS) Stray Light Corrected Nighttime Day/Night Band Composites”. This raster has a resolution of 0.004491576 by 0.004491576 degrees and was downloaded from Google Earth Engine (GEE), clipped to a bounding box around the study area.
The Index of Ecological Integrity (IEI) is a data product from the UMass Amherst Conservation Assessment and Prioritization System (CAPS). We chose to use the integrated IEI, which integrates the results of each of their specific ecosystem indices. They define ecological integrity as “the ability of an area to support biodiversity and the ecosystem processes necessary to sustain biodiversity over the long term” (UMass Amherst, 2022). This data set is an image of 30 m by 30 m resolution and was clipped to the boundaries of the study area.
The canopy height data we used is the Global Ecosystem Dynamics Investigation (GEDI) Level 3 product, gridded canopy height metrics and variability. This raster has a resolution of 1 km by 1 km and was clipped to the boundaries of the study area (see read_in_GEDI&massGIS_data.Rmd).
The road distance raster was created from MassGIS/MassDOT roads data, which was clipped to the study area. From this, the vertices of the road lines were extracted as points, and the raster package's distanceFromPoints function was run to create a distance from roads raster (see 4_create_road_distance_raster.rmd).
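To make the idea behind the distance raster concrete, the base R sketch below computes, for each cell centre of a tiny grid, the distance to the nearest road vertex. It is only an illustration of the concept and not the raster package's implementation: the coordinates are invented, assumed to be in a projected system in metres, and plain Euclidean distance is used.

```r
# Hypothetical road vertices in projected coordinates (metres).
road_vertices <- cbind(x = c(0, 100, 200), y = c(0, 0, 0))

# Cell centres of a small hypothetical 3 x 3 grid (x varies fastest).
cells <- expand.grid(x = seq(0, 200, by = 100), y = seq(0, 200, by = 100))

# For each cell, keep the smallest Euclidean distance to any road vertex.
nearest_road <- apply(cells, 1, function(cell) {
  min(sqrt((road_vertices[, "x"] - cell["x"])^2 +
           (road_vertices[, "y"] - cell["y"])^2))
})

# Arrange as a grid: one row per y value, one column per x value.
distance_grid <- matrix(nearest_road, nrow = 3, byrow = TRUE)
print(distance_grid)
```

Because distances are measured to vertices rather than to the line segments between them, the result depends on how densely the road lines are digitized.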
The Land Surface Temperature (LST) data we used was the VIIRS/NPP Land Surface Temperature and Emissivity Daily L3 Global 1 km SIN Grid Night data product, which was averaged by week. These data were imported into R using the rgee package. Persistent NA values across the rasters were filled in using the raster package's approxNA function (see 3_read_in_GEE_data.rmd).
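The gap filling can be pictured for a single pixel: approxNA estimates missing cell values by interpolating between layers, which for one pixel amounts to interpolating along its weekly time series. The base R sketch below shows that idea with linear interpolation; the values are invented, and it is a conceptual stand-in rather than the call used in the project.

```r
# Weekly LST values (kelvin) for one hypothetical pixel, with gaps.
lst_pixel <- c(285.0, NA, 289.0, NA, NA, 295.0)
weeks <- seq_along(lst_pixel)
known <- !is.na(lst_pixel)

# Linearly interpolate the missing weeks from the weeks that have data.
lst_filled <- approx(x = weeks[known], y = lst_pixel[known], xout = weeks)$y
print(lst_filled)
```

Gaps at the very start or end of a series have no neighbour on one side, so by default they remain NA under this kind of interpolation.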
Extracting Values
After reading in the data using the process described above, the values were extracted from each raster for each recording location and time. To extract LST values for each recorder at each week, a week column was added to the observations table, and values were then extracted from the layer of the LST raster brick matching that week (see 5_extract_data_from_rasters.Rmd).
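A minimal base R sketch of matching each observation to its weekly LST value is given below. Everything in it is assumed for illustration: the start date of the first weekly layer, the column names, and the small matrix that stands in for values already extracted from the weekly raster layers at each recorder.

```r
# Hypothetical daily observations.
observations <- data.frame(
  recorderid = c("A", "A", "B"),
  date = as.Date(c("2021-06-01", "2021-06-09", "2021-06-09"))
)

# Assumed first day covered by the first weekly LST layer.
start_date <- as.Date("2021-06-01")

# Week index: days since the start, in blocks of seven, counted from 1.
days_since_start <- as.integer(difftime(observations$date, start_date, units = "days"))
observations$week <- days_since_start %/% 7 + 1

# Stand-in for extracted LST: rows are weekly layers, columns are recorders.
lst_by_week <- matrix(c(285, 286,
                        288, 289),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(NULL, c("A", "B")))

# Look up the value for each observation's (week, recorder) pair.
lookup <- cbind(observations$week,
                match(observations$recorderid, colnames(lst_by_week)))
observations$lst <- lst_by_week[lookup]
print(observations)
```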
Each of the non-changing rasters (every raster except LST) was fed through an lapply in which values were extracted at each recorder location and then added back to the complete table. We chose to reproject the points into the CRS of the raster within each iteration of the lapply so as not to lose data by reprojecting or resampling the rasters (see 6_create_main_DF.rmd).
Random Forest Model
The table containing the complete information was fed into RF models using the tidymodels package collection. We chose to use the ‘impurity’ variable importance rather than ‘impurity_corrected’ because, although the corrected impurity measure is more accurate, it is not recommended when the RF model will be used to predict. To validate the models, we split our data into training and testing sets following Joseph’s (2022) recommendations, landing on an 86:14 split. The models were tuned using mtry values of 1, 3, and 5 and ntree values of 500, 1000, and 2000 to minimize the root mean square (RMS) error (see 8.5_RF_Models_Tune.Rmd). This involved the tidymodels functions workflow and tune_grid. The values that minimized the RMS error were an mtry of 1 and an ntree of 500 for the ACI model, and an mtry of 5 and an ntree of 500 for both the ADI and NDSI models. These models were then used to predict on the testing datasets (see 8_RF_Models.Rmd).
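The split and grid search can be sketched as below. This is a simplified stand-in for the tidymodels workflow, not a reproduction of it: it calls the ranger package directly, scores each grid point by ranger's out-of-bag error instead of resampled RMS error, and runs on simulated data whose column names and values are all assumptions. Model fitting is skipped if ranger is not installed.

```r
set.seed(42)

# Simulated stand-in for the complete table (names and values are invented).
n <- 100
model_df <- data.frame(
  nightlights   = runif(n, 0, 60),
  iei           = runif(n, 0, 1),
  canopy_height = runif(n, 0, 30),
  road_distance = runif(n, 0, 2000),
  lst           = runif(n, 280, 300)
)
model_df$ndsi <- with(model_df,
  0.0003 * road_distance - 0.01 * nightlights + rnorm(n, sd = 0.2))

# 86:14 split into training and testing rows.
train_rows <- sample(n, size = round(0.86 * n))
train <- model_df[train_rows, ]
test  <- model_df[-train_rows, ]

# Every combination of the mtry and tree-count values named in the text.
tuning_grid <- expand.grid(mtry = c(1, 3, 5), num.trees = c(500, 1000, 2000))
tuning_grid$oob_rmse <- NA_real_

if (requireNamespace("ranger", quietly = TRUE)) {
  for (i in seq_len(nrow(tuning_grid))) {
    fit <- ranger::ranger(ndsi ~ ., data = train,
                          mtry = tuning_grid$mtry[i],
                          num.trees = tuning_grid$num.trees[i],
                          importance = "impurity", seed = 42)
    # For regression, prediction.error is the out-of-bag mean squared error.
    tuning_grid$oob_rmse[i] <- sqrt(fit$prediction.error)
  }
  best <- tuning_grid[which.min(tuning_grid$oob_rmse), ]

  # Refit with the best settings, predict on the held-out rows,
  # and list impurity-based variable importance.
  final_fit <- ranger::ranger(ndsi ~ ., data = train,
                              mtry = best$mtry, num.trees = best$num.trees,
                              importance = "impurity", seed = 42)
  test_pred <- predict(final_fit, data = test)$predictions
  print(sort(final_fit$variable.importance, decreasing = TRUE))
}
```

Out-of-bag error is used here only to keep the sketch short; a resampling-based search such as the one tune_grid performs may select different values.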
We also used ranger RF models to extrapolate to rasters, as parsnip RF models do not work with the raster::predict function. These RF models were run using the parameters found from the tidymodels-based analysis detailed above (see 7_RF_demo.Rmd).
Cross-Correlations in Input Data (DG)
Below, Figure 4.1 through 4.4: all variables plotted against all variables
Figure 4.1: Correlations, histograms, and scatter plots of indices, date, and land surface temperature
Figure 4.2: Correlations, histograms, and scatter plots of ACI and independent variables
Figure 4.3: Correlations, histograms, and scatter plots of ADI and independent variables
Figure 4.4: Correlations, histograms, and scatter plots of NDSI and independent variables
Results/Worked Examples (AB, AN)
Main Results
From our random forest models, we determined that different environmental factors have different levels of influence on the prediction of the different indices (ACI, ADI, and NDSI). For all three indices, LST was the most important variable, perhaps because it is the only temporally changing variable (Fig 5).
There are strong cross-correlations between independent variables in our analysis. IEI, night lights, lsm, biomass, and road distance all correlate strongly with one another (P ≤ 0.001 for each pair). All correlations are in directions that are intuitive. For example, ecological integrity at the sites correlates positively with distance from roads, biomass, and lsm, and correlates negatively with night lights (Fig 2).
Our models performed relatively poorly, with R^2 values of about 0.07 for ACI, 0.49 for ADI, and 0.57 for NDSI (Table 1). The particularly low R^2 value of the ACI model shows that the variables we used accounted for little of the variation observed in that index. Visualizations of the predicted vs observed values are also available below (Fig 6).
Correlations between land surface temperature and the dependent variables are not strong. Land surface temperature correlates negatively with ADI and NDSI observations, although future analysis should attempt to control for the effects of time of year, which is tied closely to biophony, anthrophony, the presence of multiple bird species, and temperature.
Correlations between indices and independent variables are weaker than correlations between independent variables. ADI correlates positively with road distance and IEI and negatively with night lights. ACI correlates positively with road distance and negatively with biomass. NDSI shows a strong positive correlation with road distance and a weaker positive correlation with night lights.
Given this, it is difficult to draw firm conclusions about predicting biophony, anthrophony, and sound complexity from relationships this weak. While we are able to generate predictions, they should be interpreted with their relative inaccuracy in mind, as explained further in the discussion section below.
Below, Figure 5: bar graphs showing relative importance for each index
Below, Figure 6: graphs showing predicted vs observed values
Below, Table 1: error and R^2 values
##                   Metric        ACI       ADI      NDSI
## 1 Root Mean Square Error 4.43733933 0.3347448 0.2753065
## 2              R^2 value 0.06859791 0.4893325 0.5683257
## 3    Mean Absolute Error 2.71332573 0.2426337 0.1918941
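For readers who want to see how metrics like those in Table 1 are obtained, the base R sketch below computes them from observed and predicted values. The numbers are invented, and R^2 is computed here as the squared Pearson correlation between observed and predicted values, which is an assumption since the report does not state which definition was used.

```r
# Hypothetical observed and predicted index values for a test set.
observed  <- c(1.0, 2.0, 3.0, 4.0)
predicted <- c(1.5, 2.0, 2.5, 4.5)

errors <- predicted - observed

rmse      <- sqrt(mean(errors^2))        # root mean square error
mae       <- mean(abs(errors))           # mean absolute error
r_squared <- cor(observed, predicted)^2  # squared correlation (assumed definition)

metrics <- data.frame(
  Metric = c("Root Mean Square Error", "R^2 value", "Mean Absolute Error"),
  Value  = c(rmse, r_squared, mae)
)
print(metrics)
```

RMSE and MAE are in the units of the index itself, so the larger ACI errors in Table 1 partly reflect that index's larger numeric range rather than poorer fit alone.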
Below, Figure 7: Plots of images of ADI, ACI, and NDSI
Below, Figure 8: Example of Displaying predicted indices as RGB
Discussion (AB)
Overall, we found that the predictive capacity of our models is limited, and that correlations among environmental factors are strong while correlations between environmental factors and sound indices are not.
This could be attributed to a few things. First, the project had several data limitations. To begin with, we had a combination of temporal and atemporal data, so it was difficult to make direct and equivalent comparisons between all variables. We partly resolved this by averaging the temporal data, but this discards some data points and reduces complexity. We could not make the atemporal data temporal, so the adjustment could not be made in the other direction. Furthermore, not all data were from the same time period or equally recent, and some data had gaps due to how they were collected, such as the GEDI land surface metrics data, so again comparisons were not fully equivalent. The data gaps cannot be solved, but differences in recency possibly could be in the future. Finally, there were some challenges with data read-in and formatting, as well as various code issues over the duration of the project, but these were eventually resolved.
There are also limitations to the model itself. The statistical measures we calculated make clear that the models carry a lot of error, which different modeling methods could possibly reduce.
It is hoped that this project will be continued in some capacity. There is more sound index data that could be included for analysis from different times of the year, and different additional factors that could be utilized as variables in the random forest model. We have the opportunity to present our work at the Society for Conservation GIS meeting in July, and potentially move towards something of publishable quality.
Works Cited (All)
Baquero, M. P. R., Parcerisas, C., Seger, K. D., Perazio, C., Acosta, N. B., Mesa, F., Luna-Acosta, A., Botteldooren, D., & Debusschere, E. (2021). Comparison of Two Soundscapes: An Opportunity to Assess the Dominance of Biophony Versus Anthropophony. Oceanography, 34, 62–65.
Bradfer-Lawrence, T., Gardner, N., Bunnefeld, L., Bunnefeld, N., Willis, S. G., & Dent, D. H. (2019). Guidelines for the use of acoustic indices in environmental research. Methods in Ecology and Evolution, 10(10), 1796–1807. https://doi.org/10.1111/2041-210X.13254
Hanson, M. (2022, October 14). The sounds of science. Clark Now | Clark University. https://clarknow.clarku.edu/2022/10/14/the-sounds-of-science/
Heath, B. E., Sethi, S. S., Orme, C. D. L., Ewers, R. M., & Picinali, L. (2021). How index selection, compression, and recording schedule impact the description of ecological soundscapes. Ecology and Evolution, 11(19), 13206–13217.
Joseph, V. R. (2022). Optimal ratio for data splitting. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15(4), 531-538.
Keen, S. C., Wackett, A. A., Willenbring, J. K., Yoo, K., Jonsson, H., Clow, T., & Klaminder, J. (2022). Non-native species change the tune of tundra soils: Novel access to soundscapes of the Arctic earthworm invasion. Science of the Total Environment, 838(Part 3). ScienceDirect. http://goddard40.clarku.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&db=edselp&AN=S004896972203073X&site=eds-live
Robinson, D. (2017, October 10). The Impressive Growth of R. Stack Overflow Blog. https://stackoverflow.blog/2017/10/10/impressive-growth-r/
Rostami, B., & Nansen, C. (2022). Application of active acoustic transducers in monitoring and assessment of terrestrial ecosystem health—A review. Methods in Ecology and Evolution, 13(12), 2682–2691.
Sangermano, F. (2022). Acoustic diversity of forested landscapes: Relationships to habitat structure and anthropogenic pressure. Landscape and Urban Planning, 226, 104508. https://doi.org/10.1016/j.landurbplan.2022.104508
Southworth, M. (1969). The Sonic Environment of Cities. Environment and Behavior, 1(1), 49–70. https://doi.org/10.1177/001391656900100104
Transue, L., Monczak, A., Tribble, C., Marian, A., Fair, P., Ballenger, J., Balmer, B., & Montie, E. W. (2023). The Biological and Anthropogenic Soundscape of an Urbanized Port—The Charleston Harbor Estuary, South Carolina, USA. PLoS ONE, 18(4), e0283848. https://doi.org/10.1371/journal.pone.0283848